As a result of growing demands for Automated Data Processing
at the Navy Stock Points and Inventory Control Points, long
range plans are being developed around the Stock Point
Logistics Integrated Communications Environment (SPLICE)
concept. Problems and opportunities are involved with
designing and using distributed systems. This thesis
investigates the area of data dictionary/directory systems
with special focus on distributed systems and attempts to
outline the benefits for the SPLICE system from the use of a
data dictionary/directory system. Interface considerations
between the data dictionary/directory system (DDS) and
neighboring modules are also discussed.

I. INTRODUCTION


A. SPLICE AND DATA DICTIONARY

The SPLICE (Stock Point Logistics Integrated Communication
Environment) concept comes as a result of the ever growing
demands of the U.S. Navy for automated data processing
[Ref. 1] and inventory control at various points. A design
and implementation strategy is necessary, based on a
distributed architecture for a local area network (LAN).

SPLICE is designed to increase the ADP facilities of the
existing Navy stock points and inventory control points.
Because the current Uniform Automated Data Processing
System-Stock Points cannot support the growing requirements
for automated data processing (ADP) without a total redesign,
an effort has been undertaken to improve the system in the
short and long term [Ref. 1]. Two major objectives are
behind the SPLICE development:

1. To increase CRT display terminals so users can access
interactively the system's data base.

2. To standardize the various current interfaces across
the 62 supply sites.

The design approach starts from the design of the logical or
virtual Local Area Network (LAN), by specifying all the
functional modules, their characteristics, and the
communication protocols without focusing on the hardware
characteristics. A later phase of the SPLICE project will
anticipate the mapping of the virtual LAN requirements onto
a physical local network.


The following functional modules are involved in the
development of the system.

- Local communications (LC)

- National communications (NC)

- Front-End processing (FEP)

- Terminal management (TM)

- Data base management (DBM)

- Session services (SS)

- Peripheral management (PM)

- Resource allocation (RA)

This LAN design provides for distributed control but does
not provide for the distribution of data bases within a LAN.
The data bases of the SPLICE system are geographically
distributed over a wide area and, for the purpose of
maintaining the integrity of the system, the data base
functions are centralized within each LAN. A DBMS module for
the system must at least provide dictionary, integrity,
recovery, query language, and security features as well as
compatibility with existing COBOL programs.

The functions of the DBM module would be:

- Catalog, to maintain a catalog of file names and status
(name, open or closed, size, physical address of file,
physical address of index, application used in, date entered
into system, expiration date if any, location of backup copy,
format, access restrictions).

- Operations, under a menu selection scheme to perform
various functions (retrieve and display a record, update
specified fields of a record, delete a record, insert a
record, print a file, print a record or specified fields of
a record, answer specified queries and display and print the
results).

- Dictionary for defining and characterizing the data
elements. The dictionary must be integrated with the DBMS.
This will contribute to data integrity and consistency
throughout the system and should also be of great assistance
in designing report formats.
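The catalog function listed above can be pictured as one
record per file. The following is only a minimal sketch under
stated assumptions: the class and field names (CatalogEntry,
Catalog, file_address and so on) are illustrative and do not
come from the SPLICE specification, and an in-memory table
stands in for whatever storage the DBM module would use.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    """One catalog record; field names are illustrative assumptions."""
    name: str
    is_open: bool
    size_bytes: int
    file_address: str                  # physical address of the file
    index_address: str                 # physical address of its index
    application: str                   # application the file is used in
    date_entered: date
    expiration_date: Optional[date] = None
    backup_location: Optional[str] = None
    file_format: str = "unspecified"
    access_restrictions: list = field(default_factory=list)


class Catalog:
    """In-memory stand-in for the DBM catalog function."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # File names must be unique within the catalog.
        if entry.name in self._entries:
            raise ValueError(f"file {entry.name!r} is already cataloged")
        self._entries[entry.name] = entry

    def lookup(self, name):
        return self._entries[name]

    def expired(self, today):
        """Names of files whose expiration date has passed."""
        return sorted(
            e.name for e in self._entries.values()
            if e.expiration_date is not None and e.expiration_date < today
        )
```

A record without an expiration date is never reported by
expired(), which matches the "expiration date if any" wording
of the catalog description.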

With this improved design it is believed that the SPLICE
system will provide economical and responsive support
capabilities among the 62 different geographical locations,
each having a different mix of application and terminal
requirements.

The SPLICE functional design approach suggests developing
several functional modules, distributed in minicomputers
throughout the LAN with the necessary communications to
support them [Ref. 2]. This design provides for higher
system availability than the centralized approach since
functional modules can be moved from one physical node to
another without changing their logical addresses [Ref. 3].
At present there exist no exact methods for designing
distributed systems, and so an objective of the NPS research
program for SPLICE is to advance knowledge about distributed
systems and to increase understanding of how distributed
systems must be designed in order to operate effectively.

Distributed systems have problems associated with their
design that need solutions in particular areas [Ref. 4,
p. 2]. The distributed system must provide the ability for
the user to communicate and access information across the 62
local networks interconnected by the Defense Data Network
(DDN). It must be possible for the user at Naval Supply
Center (NSC) Oakland to access the Inventory Control Point
(ICP) database at Mechanicsburg in the same way as the local
database at Oakland [Ref. 4].

Figure 1.1 Network Services Directory and Dictionary.

The data dictionary must provide support to the above by
uniquely naming and identifying objects in the overall
SPLICE system. In the case of a message which is destined to
another local network, the dictionary can be used to obtain
the physical destination address with the help of the
Session Services module (Figure 1.1). For object naming and
addressing and software maintenance, the data dictionary can
help by storing all the name-to-address mapping and routing
information. The data dictionary can also be used to specify
task requirements for the user terminal processes. The data
dictionary in a distributed environment will cooperate
closely with the session services module, which provides
assistance to the user terminal processes in carrying out
their tasks. Thus a distributed operating system must
provide, in addition to other functions, the ability to
access effectively the dictionary/directory system
(Figure 1.2, from Ref. 4).
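The name-to-address mapping described above can be sketched
as a small lookup table. This is an illustration only, not
the SPLICE design: the names Directory, Address, bind and
resolve, the logical names and the node labels are all
hypothetical. The sketch shows two points made in the text: a
logical name resolves to a physical destination, and an
object can be moved to another node without its logical name
changing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    lan: str    # local network identifier, e.g. "OAKLAND"
    node: str   # physical node within that LAN


class Directory:
    """Name-to-address mapping; all names here are hypothetical."""

    def __init__(self):
        self._map = {}

    def bind(self, logical_name, address):
        # Rebinding moves an object without changing its logical name.
        self._map[logical_name] = address

    def resolve(self, logical_name, from_lan):
        """Return (address, needs_ddn) for a message sent from from_lan."""
        address = self._map[logical_name]
        return address, address.lan != from_lan


directory = Directory()
directory.bind("ICP.DATABASE", Address("MECHANICSBURG", "node-3"))
directory.bind("NSC.DATABASE", Address("OAKLAND", "node-1"))

# A user at Oakland names both databases in the same way; only the
# resolved address tells the system whether the DDN is involved.
remote_address, remote_needs_ddn = directory.resolve("ICP.DATABASE", "OAKLAND")
local_address, local_needs_ddn = directory.resolve("NSC.DATABASE", "OAKLAND")
```

Keeping the DDN decision inside resolve() is one way to let a
user at Oakland reach the Mechanicsburg database in the same
way as the local one.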

Major systems of the SPLICE application environment are the
Integrated Disbursement and Accounting (IDA), Automated
Procurement and Data Entry (APADE), Uniform Automated Data
Processing System-Stock Points (UADPS-SP), and Logistics
Data System Trident LDS. Each of the above systems has its
own elements, files, programs, transactions, users and
reports [Ref. 4].

It is vital for the system to manage all the resources
efficiently, and the distributed environment makes this job
more difficult. A data dictionary/directory system (DDS)
seems to be one approach to solving the data design and
management problem. For the centralized database environment
three aspects are emphasized [Ref. 5]:

- The software interfaces between the D/D system and
other software packages

- The convert functions of the D/D system

- The environmental dependency between the D/D system and
a database management system (DBMS).

For the distributed database environment, as in the case of
SPLICE, there must be extensions to the centralized D/D,
additional software interfaces, and the use of the D/D as a
distributed database.

B. OBJECTIVES OF THESIS

The SPLICE project at the Naval Postgraduate School (NPS)
takes the approach of designing the logical or virtual Local
Area Network (LAN) first, specifying all the functional
modules, their characteristics and the communication
protocols, rather than focusing on the hardware
characteristics of the LAN first, when developing
alternatives for SPLICE Local Area Networks [Ref. 1]. After
providing a functional specification for a distributed
operating system, user interface specifications are
provided, where the dictionary/directory system (DDS)
constitutes a major component [Ref. 4] and its function is
to provide support for naming and identifying objects in
SPLICE.

The objectives of this thesis are to investigate the area of
data dictionary/directory systems (DDS), to outline the
advantages and disadvantages of these systems, and to
present the underlying ideas. Special attention is paid to
the distributed environment and to the benefits for the
SPLICE system from using a dictionary/directory system.
Finally, an attempt will be made to introduce the interface
requirements between a data dictionary/directory system for
SPLICE and the neighboring modules.


II. DICTIONARY/DIRECTORY SYSTEMS


A. GENERAL REVIEW

A data dictionary is a description of data resources. It
contains both machine-readable and human-readable
descriptions of the database tables, their attributes,
interrelationships, and semantics. It is usually not very
large, but it has a very rich structure. Most systems have a
data dictionary facility which stores metadata about the
database aside from the database itself. The data dictionary
is often built on top of the DBMS as a special application
with a special data definition language.

Thus a DDS is a set of one or more databases containing data
about an organization's information resources. These
resources can be retrieved and analyzed using standard
database management system (DBMS) capabilities. The concept
of a data dictionary system has existed in the data
processing industry for a number of years. Use of such a
system consists, basically, of an attempt to capture and
store in a central location definitions of data and other
entries of interest [Ref. 6]. The principles of such a
system are:

- Provide for better data control

- Provide for better documentation

- Improve the quality of the systems that are built in
terms of user functionality and satisfaction and system
maintainability.

The data dictionary helps to capture and document data
elements, their definitions and some of their descriptive
attributes. It also provides for logical grouping of data
elements during the process of gathering requirements to
build a new system. The data element dictionary provides the
vocabulary that can be used between the systems analyst and
the end-user [Ref. 6].

Next in the spectrum of usage, the help of the DDS is
twofold. First, if the data dictionary is available it can
be extended to include information on how and by whom the
data elements can be used. Thus a dictionary can be used to
store the definitions of data elements and the definitions
of other data constructs (records, files), the definitions
of processes (programs or manual processes), and definitions
of data users (individuals, organizations). The second trend
that contributed to this extended usage of a dictionary
system was the gradual migration away from the use of
traditional files toward the concept of a central,
integrated database, distributed across the DDN but
centralized within each LAN, under the control of a database
management system.

The problem of duplication of data (data redundancy) can be
solved inside each LAN, but another mechanism must be
provided in order to solve that problem across the DDN.
This problem must be examined carefully and that mechanism
must provide for economy, because sometimes data redundancy
may be more cost-efficient than the frequent use of the DDN.

The above is vital for system design because in the SPLICE
environment data are to be shared not only by different
systems, but also by a wide range of users. The basic
concept of a DBMS is to provide a centrally located set of
definitions of data within each LAN that is to be shared in
order to assure that different users will access common data
with a set of consistent definitions.

The DDS acts as a repository of all definitive information
about the database such as characteristics, relationships,
and access authorizations. These databases, as implied by
the term 'logically', can be physically stored in diverse
locations within each LAN but are logically linked via
communications and the DDS.

The data dictionary system located in a node within each
LAN can be used to provide the above definitions and thus
the required data consistency.

Separating the data dictionary from the database raises two
problems [Ref. 7]:

- The dictionary and data base may disagree with one
another unless one interface has control of both functions

- Having a separate data dictionary implies having a
separate language for the definition and manipulation of the
dictionary database.
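The first problem above, disagreement between a separate
dictionary and the database, can be made concrete with a
small consistency check. The sketch below is only an
illustration under assumptions: both sides are represented
as plain mappings from table name to column definitions, and
the function name find_disagreements, the table names and
the type strings are hypothetical.

```python
def find_disagreements(dictionary, schema):
    """Compare dictionary definitions with the database's actual schema.

    Both arguments map table name -> {column name: type string}.
    Returns a list of human-readable discrepancies, ordered by name.
    """
    problems = []
    for table in sorted(set(dictionary) | set(schema)):
        if table not in schema:
            problems.append(f"{table}: defined in dictionary, missing in database")
            continue
        if table not in dictionary:
            problems.append(f"{table}: present in database, undefined in dictionary")
            continue
        defined, actual = dictionary[table], schema[table]
        for column in sorted(set(defined) | set(actual)):
            if defined.get(column) != actual.get(column):
                problems.append(
                    f"{table}.{column}: dictionary says {defined.get(column)}, "
                    f"database has {actual.get(column)}"
                )
    return problems
```

When one interface controls both the dictionary and the
database, a check of this kind is unnecessary because the two
cannot drift apart; with a separate dictionary it would have
to be run after every change on either side.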

Users who define tables and other objects (as in the case of
System R) are encouraged to include English text to describe
the meanings of the objects. Later, other users can retrieve
tables with certain attributes or can browse among the
descriptions of defined tables, if they are so authorized.
A user can later modify these entries to change the
attributes of an object.

B. MANAGEMENT OF INFORMATION RESOURCES

Information resource management (IRM) is a methodology that
attempts to solve a set of problems related to the system
life cycle in an integrated and coordinated manner. The data
dictionary system will play an important role in this area.

In the case of SPLICE the DDS can play an important role in
providing a documented inventory of information resources, a
control mechanism for the analysis and design of new
information resources, and the necessary resource
independence.

A data dictionary can be used as a powerful tool (not as a
solution) that can aid in the solution of various problems
such as inventory control, report production, proper routing
of data, proper routing of requests, data consistency,
security, etc.

Finally, the dictionary system project is in fact an
Information Resource Management (IRM)¹ project. The SPLICE
system possesses much valuable data that has been generated,
collected, and stored in an automatic and 'formatted' state.
Utilization of any class of data involves one or more
processes. These are [Ref. 6]:

- Collection: It is a process that tends to be expensive,
as the cost of identification and recording (including input
to an automated system, as necessary) can be high.

- Processing: The data collected is generally 'managed' in
some fashion before and/or after being stored. In the case
of automated data, this occurs through the use of computer
programs.

- Storage: The repository of data and information termed a
"data base".

- Retrieval: Using the knowledge about the storage
technique being used, data are retrieved to answer questions
or to be modified.

- Communications: A communication line is needed to
connect the user terminal with the place where the
dictionary resides.


¹ Information Resource Management is whatever policy,
action, or procedure concerning information (both automated
and non-automated) which management establishes that serves
the overall current and future needs of the system. Such
policies, etc. would include considerations of availability,
timeliness, accuracy, integrity, privacy, security,
auditability, ownership, use, and cost effectiveness
[Ref. 6].

The environment in which the above processes take place is
composed of:

- Data and information. Represents the core of the entire
information processing spectrum.

- The users in the system. It is the personnel involved
with the system. These are users of data and other
information components.

- Physical facilities. Computer hardware and other
physical devices used in data processing.

- Processing facilities. These are all the activities
which take place in the use of physical facilities.

- Support facilities. All the services which are required
by users of data as well as personnel whose responsibilities
are primarily in the information systems area.

Each of the above components is referred to as an
Information Resource. The computer system must provide an
integrated and coordinated way to manage the entire
information resource of the SPLICE system, and the data
dictionary has to play a major role in conjunction with the
database management module.

C. SUPPORT OF SYSTEM LIFE CYCLE

In this section, we present some highlights of how the data
dictionary supports the main steps of system development.

The waterfall model of the software life cycle [Ref. 14]
consists of the following stages: system feasibility,
requirements specification, product design, detail design,
coding, integration, implementation, operations and
maintenance. Of course there are also other models of a
software life cycle, but basically the functions of a DDS
are the same in whatever model we consider.

During the system's feasibility stage the DDS can be used
for new data element collection and to avoid redundancies
and inconsistencies. The DDS can also contain a description
of processes that are already available and so help in
assessing the true magnitude of the proposed task. During
the requirements specification stage, the data dictionary
can provide the means to detect existing inaccuracies in
definitions and to correct them before the system goes into
operation. This is because the DDS contains the overall
scope of the requirements to be specified.

During the product design and detail design stages, the DDS
can help because it contains the design details of both data
and processes, which can be shared by all members of the
design team. Particularly in database design the DDS can
record multiple user views, pass output from the logical
design phase to the physical design phase, generate multiple
designs for benchmark testing, and verify the existing
conversions of data in the system. For the rest of the
stages the DDS can help in data collection, coding, and
testing, by providing any desired degree of coordination and
control over tasks, generating data structures, storing
instructions for the staff, describing the various jobs and
activities, and finally, providing a means for effective and
consistent modification of the system.

Additional benefits that can be derived from the DDS
[Ref. 6] are naming standards, aid to auditing, interfaces
to application program development tools, and software
configuration management. A DDS allows a system to be
extended through the addition of new entity types,
relationship types, and attribute types, and can also be
used to add configuration entity types such as requirements
specifications, change notices, etc. The major advantage
from the use of the DDS is in the case of an active system
where the system not only records the entities, but also
controls how they are revised.
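The difference between merely recording entities and also
controlling their revision can be sketched as follows. This
is a hedged illustration, not a description of any real
dictionary product: the class ActiveDictionary, its methods,
and the idea of gating each revision on an approved change
notice are assumptions chosen to show one way an "active"
dictionary might behave.

```python
class ActiveDictionary:
    """Sketch of an 'active' DDS: it stores entities and gates revisions."""

    def __init__(self, entity_types=("element", "file", "program")):
        self.entity_types = set(entity_types)
        self._entities = {}      # (type, name) -> successive definitions
        self._approved = set()   # change notices cleared for use

    def add_entity_type(self, entity_type):
        # Extensibility: new types such as "change_notice" can be added.
        self.entity_types.add(entity_type)

    def define(self, entity_type, name, definition):
        if entity_type not in self.entity_types:
            raise ValueError(f"unknown entity type {entity_type!r}")
        if (entity_type, name) in self._entities:
            raise ValueError(f"{entity_type} {name!r} is already defined")
        self._entities[(entity_type, name)] = [definition]

    def approve(self, change_notice):
        self._approved.add(change_notice)

    def revise(self, entity_type, name, definition, change_notice):
        # The active part: a revision without an approved notice is refused.
        if change_notice not in self._approved:
            raise PermissionError(f"change notice {change_notice!r} not approved")
        self._entities[(entity_type, name)].append(definition)

    def current(self, entity_type, name):
        return self._entities[(entity_type, name)][-1]

    def history(self, entity_type, name):
        return list(self._entities[(entity_type, name)])
```

A passive dictionary would offer only define() and current();
the revise() gate and the retained history are what make this
sketch "active" in the sense used above.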

D. DATA DICTIONARY SYSTEM ORGANIZATION

The organizational structure for a DDS that is to be adopted
must be commensurate with the size of the activity at any
one time. Such a structure is displayed in Figure 2.1.

Figure 2.1 SPLICE Data Admin. Function Organization.

The Data Administrator is the person responsible for
articulating the data policy after the major guidelines have
been laid down by the designing team. That policy includes
planning for data collection, its structuring, its storage,
and its quality control. For the SPLICE system the Data
Administrator can be a person or a team located in any
place, whose main function will be the setting of the above
policy.

The Dictionary Administrator is the person (or team)
responsible for the dictionary system within the Data
Administrator function (e.g., recording of all
meta-information and meta-data and its maintenance through
the use of the dictionary system, along with making its
facilities available to the users of this system). Because
in the SPLICE system the data dictionary is unique
throughout the system and no different views of the data
dictionary are permitted in the various locations, that team
or person must be unique throughout the system. Only that
team (or person) may have the privilege to maintain the DD.

The Database Administrator is the person (or team)
responsible for the technical aspects of obtaining, running
and maintaining the DBMS. Since SPLICE is a distributed
system with databases distributed across 62 different
locations, the Database Administrator does not need to be
unique. The required policy and definitions are set up by
the data dictionary administrator and this is enough to
maintain consistency throughout the whole system.

The Data Quality Inspection team also has a role in the
hierarchy, and its function is the quality inspection of the
information or data, and the quality audit trail of the
whole system. This can be one or more teams. In the case of
several teams the entire audit effort can be divided among
them.

E. COMMENTS ON DDS SELECTION AND EVALUATION

It is very difficult to find a commercially available DDS
that meets exactly the requirements of a system under
development. A selection and evaluation process composed of
various steps must be developed in order to select the best
system.

Four steps are proposed by [Ref. 6] for the process of
selection and evaluation of a DDS:

- Determine the requirements for the dictionary system.
These should be classified as either being mandatory or not.
If not mandatory, establish a scale and assign numbers
indicating the importance.

- Develop a list of features of dictionary systems that
will be used in the evaluation of systems.

- Determine a mapping from the needs onto these features.

- For each mapping, using descriptions of available
systems, a system can be found either to qualify or not.
This process eliminates the systems that do not qualify.
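The four steps above can be sketched as a screening and
scoring routine. This is a minimal illustration under stated
assumptions, not the procedure of [Ref. 6] itself: the
function name evaluate, the use of the string "mandatory" as
a marker, the additive weighting of optional requirements,
and all system and feature names are hypothetical.

```python
def evaluate(candidates, requirements, mapping):
    """Screen and score candidate dictionary systems.

    candidates:   {system name: set of features it offers}
    requirements: {requirement: "mandatory" or a numeric importance}
    mapping:      {requirement: feature that satisfies it}
    Returns (scores, rejected). scores maps each qualifying system to
    the sum of the importance values of the optional requirements it
    meets; rejected maps each eliminated system to the mandatory
    requirements it fails.
    """
    scores, rejected = {}, {}
    for name, features in candidates.items():
        missing = sorted(
            req for req, level in requirements.items()
            if level == "mandatory" and mapping[req] not in features
        )
        if missing:
            rejected[name] = missing   # step 4: does not qualify
            continue
        scores[name] = sum(
            level for req, level in requirements.items()
            if level != "mandatory" and mapping[req] in features
        )
    return scores, rejected


# Step 1: requirements, mandatory or weighted by importance.
requirements = {
    "cobol copybooks": "mandatory",
    "access control": "mandatory",
    "report writer": 3,
    "online query": 5,
}
# Steps 2 and 3: feature list and the mapping of needs onto features.
mapping = {
    "cobol copybooks": "cobol_gen",
    "access control": "security",
    "report writer": "reports",
    "online query": "query",
}
candidates = {
    "System A": {"cobol_gen", "security", "query"},
    "System B": {"cobol_gen", "reports", "query"},
    "System C": {"cobol_gen", "security", "reports", "query"},
}
scores, rejected = evaluate(candidates, requirements, mapping)
```

Making the weights explicit in this way does not remove the
subjectivity of the procedure, since the importance values
themselves are still judgments of the evaluation team.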

We cannot say that the above procedure is perfect and free
of the risk of mistakes, because it is subjective and
depends to varying degrees on the experience and skill of
the selection/evaluation team. Some of the more common
reasons leading to mistakes are: the needs were never
properly assessed, and potential users were not asked the
right questions; unnecessary but apparently "nice" features
were given high values; the evaluation of the systems was
inconsistent because different people evaluated different
systems without a well-defined measurement method; undue
emphasis was placed on features that will be needed in the
future but are unimportant now, etc.

For the SPLICE system we cannot follow the above procedure.
SPLICE has decided to use Tandem as their "front end"
minicomputer. As a result, selecting a DDS is largely a
foregone conclusion in this situation. So we have to use
Tandem ZEUS and the associated dictionary capabilities.

F. ADDITIONAL ASPECTS OF DDS

In the next few years, several extensions to dictionary
systems, not available today, will most likely be
commercially available. These additions will allow
dictionaries to be more effective in interfacing with the
information resources. The use of extensibility facilities
allows an installation to customize the dictionary system in
order to make it effective in such applications. Examples
are the use of the DDS to control the total information
resource, to aid in the analysis, design and development of
information systems, and to aid in efficient database
design. The last application example is the use of the DDS
as a repository of information for an entire system. This is
exactly the major role the DDS has to play in the SPLICE
system.

Referring to the SPLICE application environment, the DDS
would require users and analysts to define the system data
elements, files, etc., which would entail updating old
definitions, discarding outdated ones, and introducing new
ones. In this way standards of data definition and
description for application programs can be established over
the entire SPLICE system [Ref. (I]. On the other hand, it is
a Herculean task to retrofit a dictionary to existing
application systems. Because of the difficulties in
implementing the dictionary for old application systems, we
recommend as much preferable to implement a dictionary for
new applications only. That means that the dictionary will
be developed gradually and a long period will be needed
before it is fully implemented for the whole SPLICE system.

Although DDSs have many advantages, their disadvantages
should be mentioned as well. Dictionary systems are complex
software systems and the execution of many dictionary
functions may consume a significant part of the system
resources. As the scope of the dictionary is enlarged to
include an ever larger number of information resources, the
DDS will begin gradually to look like the major resource
consumer, and thus the main user of the host computer system
[Ref. 6]. When we consider active interfaces of the DDS, the
previous problem becomes more serious. If the DDS controls a
process through one of these active interfaces, it follows
that this process cannot proceed until such time as the
dictionary system has finished its job. This delay time is
added to the whole process time. Given that there can be
many processes, the continuous use of the DDS and the
accumulated service time may eventually result in a
bottleneck.

The proposed solution for the SPLICE system [Ref. 4] can
avoid (or at least reduce) this overhead by locating one
copy of the DDS in each LAN. With this simple and efficient
technique each user located in any of the 62 stock and
inventory control points only needs to consult the local
DDS. The number of users who need the DDS services remains
the same, but the overhead from the long queuing time across
the DDN will be reduced by a factor close to 62. By locating
the master copy of the DDS in one place we can solve the
maintenance problem of the DDS, because additions, deletions
and updates of the DDS can be done only via the master copy
by the Dictionary Administrator. All the other copies can be
updated only remotely by the master copy through the DDN, in
such a way as to represent the exact image of the master
copy. Because changes in definitions (deletions, updates,
additions) are not frequent, we estimate that the whole
process of updating the local copies of the DDS will not be
expensive, and the resultant overhead will not be
significant. Of course this assumes all 62 LANs are working
off the same schema, and the application environment is
homogeneous across the network.
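The master-copy arrangement described above can be sketched
as follows. This is an illustration under assumptions, not
the SPLICE implementation: the classes MasterDictionary and
DictionaryCopy, the version counter, and the choice to ship
the full set of definitions on every change are hypothetical
simplifications, and the DDN transfer is represented by a
plain method call.

```python
class DictionaryCopy:
    """Read-only local copy of the DDS held in one LAN."""

    def __init__(self, lan):
        self.lan = lan
        self.version = 0
        self.definitions = {}

    def lookup(self, name):
        # Served locally: a read needs no DDN traffic.
        return self.definitions[name]


class MasterDictionary:
    """Single master copy; only the Dictionary Administrator updates it."""

    def __init__(self):
        self.version = 0
        self.definitions = {}
        self.copies = []

    def attach(self, copy):
        self.copies.append(copy)
        self._push(copy)

    def update(self, name, definition):
        """Add or change a definition (None deletes it), then propagate."""
        if definition is None:
            self.definitions.pop(name, None)
        else:
            self.definitions[name] = definition
        self.version += 1
        for copy in self.copies:
            self._push(copy)   # stands in for one DDN transfer per copy

    def _push(self, copy):
        # Each local copy becomes an exact image of the master.
        copy.definitions = dict(self.definitions)
        copy.version = self.version
```

Because every change costs one transfer per attached copy,
the scheme is cheap only while definition changes remain
infrequent, which is the assumption stated in the text.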

G. HIERARCHY OF DDS

A good hierarchical DDS structure is significant if we
want to avoid the "bottleneck" mentioned above. A structure
is proposed in Figure 2.2 and we believe that it is less
expensive in consuming the system resources than the
structure of having different views of the master dictionary
at each LAN.

Figure 2.2   A First DDS Hierarchical Structure for SPLICE.

In particular suppose the copies of the local
dictionaries are not exact images of the master dictionary,
but are different views of the master, especially views
containing information only for the local database. In such
a case it is not useful to separate the definitions from the
actual database, since the different views of the whole
database are centralized within each LAN. If a spare part,
for example, cannot be found in a local database, then the
user has to consult the master dictionary to find the
location of the requested spare part, because the local copy
of the data dictionary does not contain information about
other databases of the system. In this case the user has to
access the DDN twice, first to consult the master dictionary
and then to consult the local database in which the spare
part is located. This procedure can easily lead to long
waiting times and finally to a "bottleneck", because the
master dictionary will have to answer questions coming from
62 different LAN's. A second hierarchical structure is shown
in Figure 2.3. This structure involves the location of a
copy of the DDS in selected nodes instead of each node. In
this way we reduce the amount of secondary memory needed to
store the D/D but we increase the use of the DDN. This
increase in use of the DDN is inversely proportional to the
number of replicated D/D copies. The solution of locating
exact copies of the master dictionary in each or selected
LAN's has the disadvantage of consuming more secondary
storage, but our estimation is that this is preferable and
less expensive than the frequent use of the DDN in order to
consult the master copy.
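
The comparison above can be made concrete with a small counting
model. The sketch below is illustrative only: the names Structure
and ddn_accesses are hypothetical, and the model assumes that
reaching data at another LAN costs exactly one DDN access and that
a LAN without a dictionary copy needs one DDN access to consult a
remote copy.

```python
from enum import Enum


class Structure(Enum):
    EXACT_COPY_EVERY_LAN = 1      # first structure (Figure 2.2)
    EXACT_COPY_SELECTED_LANS = 2  # second structure (Figure 2.3)
    LOCAL_VIEWS_ONLY = 3          # each LAN holds only a view of the master


def ddn_accesses(structure, data_is_local, lan_has_copy=True):
    """Count DDN accesses needed to locate and reach one data item."""
    # Assumption: reaching data held at another LAN costs one DDN access.
    data_cost = 0 if data_is_local else 1

    if structure is Structure.EXACT_COPY_EVERY_LAN:
        dictionary_cost = 0  # the local copy knows every location
    elif structure is Structure.EXACT_COPY_SELECTED_LANS:
        # Assumption: a LAN without a copy asks a remote copy once.
        dictionary_cost = 0 if lan_has_copy else 1
    else:  # LOCAL_VIEWS_ONLY
        # The local view describes local data only; anything else
        # requires asking the master dictionary over the DDN.
        dictionary_cost = 0 if data_is_local else 1
    return dictionary_cost + data_cost
```

Under these assumptions a remote spare part costs two DDN accesses
with local views but only one with an exact copy in every LAN, which
is the difference the text argues from.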

We cannot say that distribution instead of replication
of the DDS is an inefficient method not acceptable for
SPLICE. Since there is not enough experience with
distributed systems, and especially with data dictionaries,
we have to examine carefully every possible architecture,
the pros and the cons of each one, in order to make the best
decision. But still we believe that the decision will be
based more on estimations coming from intuition and less on
experience and statistical information. Such an architecture
is based on distribution instead of replication of the D/D
for SPLICE. This is shown in Figure 2.4, and will be
examined in a later chapter.

Figure 2.4   A Third DDS Hierarchical Structure for SPLICE.

D.   FEATURE ANALYSIS OF DDS

In this section the features of a DDS and a more detailed
analysis of them will be presented. This presentation is a
theoretical approach and does not concern any particular
system. A cost/benefit analysis can tell us which features
need to be included in a DDS under development. It is a more
preferable approach than to develop a DDS as described below
using the Tandem DBMS capability.

1.   Architecture and Implementation

The relationship between DDS and DBMS will be
addressed here. The purpose of a DBMS is to manage data and
the purpose of a DDS is to manage meta-data.2 The question
is whether the DDS must be a free-standing3 or
DBMS-dependent4 system [Ref. 6].

The free-standing approach is good for commercial
systems because each enterprise can evaluate the pros and
cons and reach the optimal decision whether to buy or not.
This approach raises compatibility problems between the DDS
and the DBMS, especially when the vendors are different
companies. There are many factors we have to take into
account when deciding whether a DDS must be free-standing or
DBMS-dependent. These factors include the method of
implementation, the scope of usage, whether the DDS and DBMS
are going to be developed together or not, and whether they
are going to be supplied by the same vendor or not.

One other feature of DDS architectural structure is
whether the DDS should be passive or active. Suppose there
is a compiler, application program, or other process that
requires meta-data for its execution. There should be a DDS
available which automatically produces the required
meta-data. This functionality is referred to as the
dictionary interface and can operate in two modes:

Passive, where there exists an option of whether the
process will retrieve the required meta-data (through the
dictionary interface or from elsewhere) or, in the case
where the process already contains the meta-data, there
exists an option for the system to check whether this
meta-data is the most current version in the dictionary.
Here the dictionary is not in the critical path of a
process.

Active, where the above options do not exist and the
process always uses the most current meta-data in the
dictionary. The dictionary here is in the critical path of
the process, and the process must go through the dictionary
for the meta-data in order to execute properly.


2 Meta-data is the data that describes data.

3 A dictionary system which does not use a DBMS in its
implementation.

4 A dictionary system which does use a DBMS in its
implementation.

A DDS can contain both kinds of interfaces. We have
to keep in mind that the interfaces of the DDS system do not
only concern the DDS itself, but also other modules with
which the dictionary has to cooperate in order to maintain
the whole system.
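
The difference between the two modes can be sketched in a few lines
of code. This is a minimal illustration, not part of any SPLICE
design: the classes Dictionary, PassiveProcess and ActiveProcess and
the element name PART-NO are all hypothetical.

```python
class Dictionary:
    """Toy dictionary holding versioned meta-data per data element."""

    def __init__(self):
        self._entries = {}  # name -> (version, definition)

    def define(self, name, definition):
        version = self._entries.get(name, (0, None))[0] + 1
        self._entries[name] = (version, definition)

    def current(self, name):
        return self._entries[name]


class PassiveProcess:
    """Keeps its own copy of the meta-data; checking it is optional."""

    def __init__(self, dictionary, name):
        self.dictionary = dictionary
        self.name = name
        self.version, self.definition = dictionary.current(name)

    def is_current(self):
        return self.dictionary.current(self.name)[0] == self.version

    def run(self):
        # The dictionary is not in the critical path: no lookup here.
        return self.definition


class ActiveProcess:
    """Always fetches the meta-data from the dictionary before running."""

    def __init__(self, dictionary, name):
        self.dictionary = dictionary
        self.name = name

    def run(self):
        # The dictionary is in the critical path of every execution.
        return self.dictionary.current(self.name)[1]
```

After a definition changes, the passive process keeps running with
its old copy until it chooses to check, while the active process
picks up the new definition on its next execution.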

2.   Logical Schema, Entity Types, Relationships

Dictionary schema is the term denoting the logical
structure of a dictionary. Structural characteristics and
contents of the dictionary schema determine the kinds of
meta-data and the relationships to be established among
them. Using the entity-relationship-attribute model
[Ref. 6] for the dictionary, we define entities as real
world objects or things about which information exists in
the dictionary, attributes as properties (quantities or
qualities) of the entities, and relationships as connections
between entities.

In the DDS, resources such as data, hardware,
software, transactions, personnel and documents may be
represented, and entities, attributes, and relationships
associated with these resources must also be represented.
Tables I through V at the end of this chapter, taken from
[Ref. 4], indicate possible data element attributes, file
entity attributes, hardware entities and attributes,
software entities and attributes, and document/report
attributes for the SPLICE system.

Similar entities in a DDS establish entity types.
Attributes can also have a degree of similarity and in this
case we speak about attribute types. Finally, similar
considerations apply to relationships and so we have
relationship types, that are relationships between entity
types.

Schema descriptor: In a dictionary schema
containing all existing entity-types, relationship-types,
and attribute-types, any one of them can be referred to as a
schema descriptor. Information existing in the schema can
indicate which entity-types are members of a given
relationship-type, and which attribute-types are associated
with an entity-type or relationship-type.
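
One way to picture a dictionary schema and its descriptors is as a
small data structure. The sketch below is an assumption made for
illustration: the class DictionarySchema, its method names, and the
sample type names (FILE, ELEMENT, LENGTH) do not come from the cited
references.

```python
from dataclasses import dataclass, field


@dataclass
class DictionarySchema:
    entity_types: set = field(default_factory=set)
    attribute_types: set = field(default_factory=set)
    # relationship-type name -> tuple of member entity-type names
    relationship_types: dict = field(default_factory=dict)
    # entity- or relationship-type name -> set of attribute-type names
    attributes_of: dict = field(default_factory=dict)

    def add_relationship_type(self, name, members):
        unknown = [m for m in members if m not in self.entity_types]
        if unknown:
            raise ValueError(f"unknown entity-types: {unknown}")
        self.relationship_types[name] = tuple(members)

    def attach_attribute(self, owner, attribute):
        if owner not in self.entity_types and owner not in self.relationship_types:
            raise ValueError(f"unknown schema descriptor: {owner}")
        self.attribute_types.add(attribute)
        self.attributes_of.setdefault(owner, set()).add(attribute)

    def descriptors(self):
        """Every entity-, relationship- and attribute-type name."""
        return self.entity_types | set(self.relationship_types) | self.attribute_types

    def members_of(self, relationship):
        return self.relationship_types[relationship]
```

The two lookups members_of and attributes_of correspond to the two
kinds of information the paragraph says the schema can provide.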

Entity-types of a DDS can be classified as data
entity-types, process entity-types and usage entity-types.
On the other hand attribute types can be descriptions,
classification and audit attributes created by the
dictionary to indicate identification of the person who
created the entity, date of entity creation, identification
of the person who last modified the entity, date of latest
modification, and total number of modifications of the
entity [Ref. 6]. These capabilities are very useful for a
system, especially one as complex as SPLICE. Using the above
capabilities, reports and summaries can be presented on
request, and also we can have a trace of various
interactions on the system using application programs for
this reason.

3.   Interfaces and Commands

Interfaces must be included in a DDS in order to
allow the user to communicate with the DDS via a terminal.
The terminal-DDS communication in the SPLICE system is
carried out through the Session Services module. This is a
separate topic which will be examined separately. In
general an interface can be as shown in Table VI.

On the other hand commands can be classified, on the
basis of their functionality, into various categories as
shown in Table VII.

A dictionary system can be regarded as a software
product that helps in storing information about data that
already exists in databases. Both DDS and DBMS deal with
descriptions and characteristics of data elements and with
the logical structures obtained from these elements and
their relationships. A closely integrated dictionary system
and automated database design process have much to offer.
The interfaces between a dictionary and a database design
process can be divided into two broad categories:
-Initial data entry and editing
-Logical model structuring

Initial data entry and editing: For data entry the
data requirements information needed by automated database
design procedures is almost a complete (proper) subset of
the information normally stored in current commercial
dictionary systems. For SPLICE the files already exist
but the dictionary does not. Therefore the whole design of
the DDS must provide for initial detection and avoidance of
duplicate entries. As soon as the design takes care of that
during the initial steps, then the entry of information
about raw data elements has to be made only to the
dictionary system. Next an interface must exist in order to
allow the design procedures to access information in named
aggregations (local views). For editing, the initial data
entry is rarely clean in the sense that names, usage, and
characteristics of the data elements may not yet be
standardized across local views. Synonyms, homonyms and
inconsistent characteristics of the same data usually result
when data requirements are gathered from different sources.
The editing phases of the automated design procedures, and
the reports produced therein, can serve as an input
filtering function for the dictionary. When the interactive
editing phases are completed, obsolete information (e.g.
non-standard names) can be removed from the dictionary, such
that the information remaining permanently is clean and
consistent. Again, as we mentioned in a previous section,
this can be done only for new applications because the task
of retrofitting a dictionary to existing application systems
is very difficult.
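
The kind of report an editing phase might produce can be sketched
as follows. This is a rough illustration under stated assumptions:
each entry is a hypothetical (view, name, characteristics) tuple,
identical characteristics are treated only as a hint that two names
may be synonyms, and the function names are invented for this
example.

```python
from collections import defaultdict


def find_conflicting_names(entries):
    """Names that appear with more than one set of characteristics.

    These are candidates for homonyms or for inconsistent
    descriptions of the same data element.
    """
    seen = defaultdict(set)
    for _view, name, characteristics in entries:
        seen[name].add(characteristics)
    return sorted(name for name, variants in seen.items() if len(variants) > 1)


def find_synonym_candidates(entries):
    """Groups of different names sharing identical characteristics."""
    by_characteristics = defaultdict(set)
    for _view, name, characteristics in entries:
        by_characteristics[characteristics].add(name)
    return sorted(sorted(names) for names in by_characteristics.values()
                  if len(names) > 1)
```

Both reports only flag candidates; deciding whether two names really
describe the same data element remains a task for the interactive
editing phase.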

Logical model structuring: The structuring
procedure for initial design should be able to extract
filtered, unstructured data element information in named
aggregates (local views) from the dictionary such that the
composite model and the derived logical designs can be
generated in the normal manner.

For adding new requirements to existing designs and
when processing new functions or adding new data to an
existing database, the design process should be able to
extract from the dictionary a description of the existing
design along with the filtered unstructured data element
information for that which is new. Various levels of
constraints on the freedom of structuring processes can be
set here in order to facilitate the whole design effort.

Once the automated design process is completed and a
suitable logical design has been obtained, the results must
be stored in the dictionary. Assuming the unstructured data
elements are already described in the dictionary, the
relationships defining segments, databases, logical
relations and secondary indexes would now be stored.


TABLE I
Data Element Attributes

   Type
   Range
   Length
   Unit of measure
   Usage
   Language names
   Repetitions
   88 levels
   Key
   Default value
   Display format


TABLE II
File Entity Attributes

   File name
   Locations
   Size (in bytes)
   Format (seq, random, tin)
   Access control
   Access security protection


TABLE III
Selected Hardware Entities and Attributes

Entities
   Processing system
   Secondary storage
   Communications system
   Concentrators
   Terminals
   LAN I/O peripherals

Attributes
   Type
   Model
   Model number
   Serial number
   Mfger's number
   Source
   Features
   Description
   Docu. references
   Usage by site
   Cost
   Maintenance activity


TABLE IV
Selected Software Entities and Attributes

Entities
   Operating system
   Operational support system
   Environmental system
   Application software

Attributes
   Program-id
   Revision number
   Revision date
   Date compiled
   Type of compiler
   Patch level
   Change level
   License
   Date released
   Product number
   Source
   Features
   Documentation
   Usage
   Cost
   Maintenance activity


TABLE V
Document/Report Attributes

   Name
   Number
   Product number
   Release date
   Revision number
   Source
   Feature
   Description
   Quantity
   Cost

TABLE VI
Kinds of DDS Interfaces

   Command language
   Screen oriented interface
   Fixed format batch data entry facility
   Programmatic interface that allows user written
   applications programs to access the dictionary


TABLE VII
Command Categories for DDS

   Dictionary maintenance
   Report and query
   Data structure interface
   Extensibility
   Status related
   Security
   Dictionary processing control
   Dictionary administrator


III.   INTEGRATION

A.   THE PROBLEM

An active5 data dictionary is desirable for the SPLICE
system. It is also known [Ref. 8] that most dictionaries
fail to meet this objective. A prerequisite to an active
dictionary is a high degree of interaction between the
dictionary and various other software elements such as the
DBMS itself, but also including query languages, report
generators, application development aids, and the like. An
architecture for a centered and highly integrated DDS taken
from [Ref. 6] is shown in Figure 3.1.

The existing dictionaries today are noticeably
unintegrated, and hence less than active. Such a situation
is shown in Figure 3.2 (taken from [Ref. 8]) concerning the
IBM DB/DC data dictionary and related software. Notice, in
particular, that whereas some batch feeding of data is
provided to and/or from the dictionary, there are no fewer
than six places where database definition data is stored (in
addition to data definitions included in actual programs)
[Ref. 8]. These are:

The DB/DC dictionary itself

The DBD/PSB libraries

The COBOL copy library

The database design aid (DBDA)

The GIS data definition tables

The application development facility (ADF), segment
rules in an IMS/DC environment, or in
development management system (DMS) files in
a CICS environment.

There is no guarantee that each of these descriptions
will agree at any point in time. Other data dictionaries
may have a higher degree of integration but no one is close
to the degree of integration suggested in Figure 3.1. A
high level of integration is very much needed in order to
support the advanced functions of an active dictionary. To
see that better, consider a user who wants to know what data
is in the database, or a DML routine which wants to edit a
field prior to updating the database, or the database access
system which needs to know if a user password is valid for
updating a certain record. All the above functions require
direct access to the data dictionary.

Figure 3.1   Highly Integrated D/D Centered Architecture.
(Figure labels: data base, DBMS, inquiry, report generator,
data definition generator, data dictionary, application
generator, application program, metadata base, data base
design aid.)


5 Active to some degree, because if it is too active we
can lose efficiency.

The extent to which a DDS qualifies as being
"integrated" is a relative notion determined by the scope of
its metadata and the way that it interfaces with other
software. The most common use of the term "integrated" is
with reference to a D/D that is the sole source of metadata
in the system. The integrated D/D is accessed for all
references to metadata. Most of the commercially available
DDS have not reached a high degree of integration with their
environments, and this results in multiple sources of
descriptors within the systems. The DDS permits these
systems to access the D/D indirectly and convert the
metadata of each system to the format required by the D/D
[Ref. 5]. So for example a DDS might communicate with a
compiler in either of two ways:

-By generating file and record definitions
that the compiler accepts via copy statements.

-By reading source programs and creating
transactions to load the DDS with descriptions
of files, records, and elements.

One additional area which demands investigation for the
development of a successful DDS concerns integrating schemas
which describe the logical structures of all data types
existing in a distributed (like the SPLICE) database. This
feature permits the determination of a data file's logical
structure as well as its identity and location, and could
possibly be essential to the development of query and data
model translation schemes. The existence of a master schema
also permits the logical relation of data across file
boundaries; then all files in the network can be considered
as areas within a single large database [Ref. 9].

Figure 3.2   IBM Data Management Architecture.

B.   INTEGRATION OF DDS

Three aspects of integrated DDS in the centralized and
distributed database environment for SPLICE6 are of great
interest and must be emphasized [Ref. 5]:

-The software interfaces

-The convert functions

-The environmental dependency between the DDS and the
DBMS

A DDS is integrated with other software packages by
facilities that:

-Allow direct and indirect access to the D/D

-Automatically capture the metadata used by other
systems

In the next three subsections we will examine the three
most interesting aspects of an integrated DDS.

1.   Software Interfaces

A software interface permits another system to
access the D/D either statically or dynamically. First we
consider the static interface, which links the D/D with
another system indirectly via the extraction of a file of
formatted metadata. For the static interface of a DDS and a
DBMS, for example, the data dictionary administrator,
following the specifications of the data administrator,
enters into the DDS all pertinent transactions to define the
database, and the database administrator, using the above
definitions, describes the database. After reviewing the
accuracy of this database description, a command is
generated for the DDS that uses this description to produce
a file containing the DDL. The DBMS's DDL processor then
translates this generated DDL into a schema file that the
run time unit of the DBMS can access. No run-time connection
between the DDS and the DBMS exists here; the DBMS's
processor is not executing during the DDS's DDL-generation
process.


6 Our approach for the SPLICE database and data
dictionary distribution is hybrid. SPLICE is a distributed
system, but the databases are centralized within each LAN.
Also the dictionary copies at each of the selected LAN's are
exact copies of the master dictionary and different
dictionary views are not permitted. So the whole SPLICE
system can be viewed as a distributed system, but concerning
each particular LAN, the database and data dictionary can be
said to follow the centralized database environment concept.
So both ideas of centralized and distributed environments
can be applied to SPLICE with slight modifications.

Static interfaces differ somewhat, depending upon
whether they interface the DDS with user-written programs or
with vendor-supplied software packages. Static interfaces
for programs written in languages such as COBOL and PL/I
produce file, record, and database descriptions for the user
programs from the data dictionary [Ref. 5]. These
interfaces sometimes feature edit capabilities, format
options, and various other functions to make the interface
more flexible. Edit capabilities may include being able to
add prefixes and suffixes and even to replace entire names.
Format options may control indentation, level-number
increments, sequence numbers, and line identifiers.
Inclusion of various clauses such as comments, condition
names, and initial values also may be allowed.
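
A static interface of this kind can be pictured as a small
generator that turns dictionary entries into source text. The
sketch below is hypothetical: the function generate_record, its
parameters, and the COBOL-style output layout are assumptions chosen
to show one edit capability (a name prefix) and two format options
(level-number increment and indentation).

```python
def generate_record(record_name, elements, prefix="", level_start=1,
                    level_step=4, indent=4):
    """Return COBOL-style lines for one record held in the dictionary.

    elements is a list of (name, picture) pairs taken from the D/D.
    """
    lines = [f"{level_start:02d}  {prefix}{record_name}."]
    level = level_start + level_step   # level number of the elements
    pad = " " * indent                 # indentation format option
    for name, picture in elements:
        lines.append(f"{pad}{level:02d}  {prefix}{name}  PIC {picture}.")
    return lines
```

The generated lines would be written to a library member that user
programs include via copy statements; the generator itself never
runs at the same time as the compiler.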

Static interfaces for software packages, such as DDL
processors, communication monitors, and query processors,
produce formatted statements for those packages or create
specially encoded control files for their use.

Static interfaces are prevalent because of their
utility, capability, and efficiency. With powerful static
interfaces, the data administrator can quickly change
formatted metadata or create new formatted definitions from
existing D/D entities. The static D/D can be made
compatible with many versions of other software packages and
can be developed independently of the source code of
particular software packages. A disadvantage to the user of
a static interface is the extra effort that may be required
to generate and catalog metadata for the D/D.

More significantly, the static interface itself has
no capabilities for updating the metadata of the systems
with which it interfaces. Without adequate synchronization
and controls, the metadata in the DDS and the metadata in
other systems may become inconsistent [Ref. 5].

Dynamic interfaces give other software modules direct
access to the DDS. This direct access is commonly
achieved via high-level interface commands that shield the
software package from the physical details of the D/D. The
commands activate standard DDS functions, such as selecting
all entity occurrences that satisfy a particular condition.
A DDS can provide a facility that makes commands available
through call statements; any program can then access the D/D
without knowledge of its physical structure. Dynamic
interfaces provide consistency control and capabilities for
both update and retrieval. Changes to the D/D are
automatically reflected in the next execution of any
software packages to which the D/D is interfaced; no
intervening procedures are required as with static
interfaces. A software package can directly retrieve and
update metadata stored in the D/D if the user has the
authority to do so, and the software package has such a
capability. Otherwise the software package and the user
would only have read authority to the D/D.
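
A call-level dynamic interface might look roughly like the sketch
below. It is an illustration only: the class DynamicInterface, its
methods, and the user names are hypothetical, and the D/D is
modelled as a plain in-memory mapping rather than any real storage
structure.

```python
class DynamicInterface:
    """Call-level access to a D/D held as a dict of entity records."""

    def __init__(self, entities, update_users=()):
        self._entities = entities            # name -> attribute dict
        self._update_users = set(update_users)

    def select(self, condition):
        """Names of all entity occurrences that satisfy condition."""
        return sorted(name for name, attrs in self._entities.items()
                      if condition(attrs))

    def retrieve(self, name):
        # Return a copy so callers cannot bypass update().
        return dict(self._entities[name])

    def update(self, user, name, **changes):
        if user not in self._update_users:
            raise PermissionError(f"{user} has read authority only")
        self._entities[name].update(changes)
```

Callers see only select, retrieve and update; how the entities are
physically stored is hidden behind those calls, and a user without
update authority is limited to reading.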

Here is where special attention must be given when
designing a DDS for SPLICE. We said previously, when we
described the first and the second hierarchical structure
for SPLICE, that the local copies of the SPLICE DDS will be
exact images of the master copy. With this approach one can
imagine what will happen if one program in any of the 62
LAN's attempts to update the metadata stored in the DDS.
The whole consistency of the system is gone. The local
copies will no longer be exact images of the master copy and
many problems can arise. The only solution for the proposed
architecture for the SPLICE DDS is that requests for update,
deletion, or addition of data definitions must be routed via
the DDN to the node where the master copy of the DDS
resides. Then the data dictionary administrator, who is the
only person responsible for DDS maintenance, can approve and
make the requested changes in the master copy. These
changes must then be transmitted to the various locations
where copies of the D/D reside and executed. This we believe
is the only procedure under the proposed DDS architecture
which can maintain consistency over the whole SPLICE system.
We cannot say that this kind of operation is purely dynamic,
but neither is it static. We might call it a hybrid
interface function wherein the security and validity checks
of the DDS are always applied.
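
The routing procedure described above can be sketched as a small
model. Everything here is an assumption made for illustration: the
class MasterDictionary, its method names, and the use of None to
mark a deletion are not taken from the SPLICE documentation, and
network transmission over the DDN is reduced to a plain copy.

```python
class MasterDictionary:
    """Master copy plus the exact-image copies held at each LAN."""

    def __init__(self, lan_names):
        self.definitions = {}
        self.pending = []                       # requests awaiting approval
        self.copies = {lan: {} for lan in lan_names}

    def request_change(self, origin_lan, name, definition):
        """Queue a request routed from a LAN; nothing changes yet."""
        self.pending.append((origin_lan, name, definition))

    def approve_all(self):
        """Administrator applies pending requests, then propagates them."""
        for _origin, name, definition in self.pending:
            if definition is None:
                self.definitions.pop(name, None)   # deletion request
            else:
                self.definitions[name] = definition
        self.pending.clear()
        for lan in self.copies:
            # Each local copy becomes an exact image of the master.
            self.copies[lan] = dict(self.definitions)

    def consistent(self):
        return all(copy == self.definitions for copy in self.copies.values())
```

Because local copies are only ever overwritten from the master, no
LAN can diverge on its own; the cost is that every change waits for
approval at the master node.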

The use of dynamic interfaces incurs significant
overhead due to the size and complex structure of the DDS.
Application development support aids, such as preprocessors,
source program managers, and design aids generally can
afford this overhead because response time is not critical.
On the other hand, efficiency is critical for
transaction-processing systems that reference the D/D.

To reduce the potential overhead, common queries may
be precompiled and stored in the D/D. Another technique
used to reduce overhead is for the software package to
retrieve all the metadata required for a transaction at
once; thus future accesses for this transaction only involve
memory lookup. Table VIII from [Ref. 5] shows some typical
types of software package interfaces for DDS.
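
The second technique, fetching all metadata for a transaction at
once, amounts to a simple cache. The sketch below is an assumption
for illustration: MetadataCache and its fetch_many callable are
hypothetical names, and the sketch ignores the question of when
cached metadata must be refreshed after the D/D changes.

```python
class MetadataCache:
    """Fetch all meta-data for a transaction type once, then reuse it."""

    def __init__(self, fetch_many):
        self._fetch_many = fetch_many   # callable: list of names -> dict
        self._by_transaction = {}
        self.dictionary_calls = 0       # how often the D/D was consulted

    def metadata_for(self, transaction, element_names):
        if transaction not in self._by_transaction:
            # One round trip to the D/D for the whole transaction.
            self.dictionary_calls += 1
            self._by_transaction[transaction] = self._fetch_many(element_names)
        # Later accesses are a memory lookup only.
        return self._by_transaction[transaction]
```

The counter dictionary_calls makes the saving visible: repeated
executions of the same transaction type consult the D/D only once.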

2.   Convert Functions

In addition to software interfaces, the integration
of a DDS into its environment is provided by convert
functions. An organization using a DDS has many programs,
reports and files to manage. The data/data dictionary
administrator must encode thousands of maintenance
transactions to capture the metadata of all these
applications. The convert functions of a DDS scan source
programs, database descriptions, and teleprocessing
environment descriptions and automatically produce
maintenance transactions, thus sparing the data
administrator many hours of manual effort. Figure 3.3
from [Ref. 5] illustrates the flow of data through a typical
convert function.

Figure 3.3   System Flow for a Convert Function.
(Figure labels: source language statements, convert
function, data dictionary transactions, data dictionary
maintenance.)

Inputs include the source language statements and
the D/D; outputs are a file of transactions to be input to
the D/D maintenance module (in the case of SPLICE that
refers to the maintenance module of the master copy) and a
report.

The D/D maintenance transactions include
descriptions of databases, files, records, groups, elements
and programs. The prime purpose of convert functions is to
convert metadata from both user-written programs and from
the local DBMS and its related components. Table IX
illustrates in summary the typical D/D convert function
transactions.

Four major characteristics [Ref. 5] for convert
functions are:

The content of the generated transactions, where the D/D
maintenance transactions created by a convert function
usually also contain the relationships between data
entities.

The input file to a convert function, which can be a source
program or a library file.

The command options, which may include the ability to change
names, select lines to scan, select types of transactions to
create, and override generation of some types of metadata,
where the ability to analyze the metadata of source programs
can make the DDS a valuable tool for auditing adherence to
software control techniques.
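
A convert function can be illustrated with a toy scanner for
COBOL-style data descriptions. This is a sketch under stated
assumptions: the function convert, the two regular expressions, the
transaction tuples (ADD, MODIFY, RELATE) and the report fields are
all invented for this example, and only the simplest one-line
declarations are recognized.

```python
import re

# Matches lines such as "05  PART-NO  PIC X(9)."
ELEMENT_LINE = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+(\S+?)\.\s*$")
# Matches group or record lines such as "01  STOCK-ITEM."
GROUP_LINE = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)\.\s*$")


def convert(source_lines, known_names=()):
    """Scan source statements; return (transactions, report).

    known_names stands in for the D/D input: names already defined
    produce MODIFY transactions instead of ADD.
    """
    known = set(known_names)
    transactions, skipped = [], 0
    parent = None
    for line in source_lines:
        group = GROUP_LINE.match(line)
        element = ELEMENT_LINE.match(line)
        if group:
            parent = group.group(2)
            kind, name, detail = "RECORD", parent, None
        elif element:
            kind, name, detail = "ELEMENT", element.group(2), element.group(3)
        else:
            skipped += 1
            continue
        action = "MODIFY" if name in known else "ADD"
        transactions.append((action, kind, name, detail))
        if kind == "ELEMENT" and parent:
            # Relationship between the record and its element.
            transactions.append(("RELATE", parent, name, None))
    report = {"transactions": len(transactions), "skipped_lines": skipped}
    return transactions, report
```

The two outputs mirror the description above: a list of maintenance
transactions destined for the D/D maintenance module, and a report
summarizing what was generated and what was skipped.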

3.   Environmental Dependency

Ihis  characteristic  cf  a  DDS  is  determined  tv  its 
reliance  en  a  specific  hardware  configuration,  an  operating 
systeir,  a  DBMS,  or  a  teleprocessing  menitor.  Under  ideal 
conditicrs  a  DDS  must  have  the  capability  to  cperate  in  such 
an  environment  without  losing  efficiency  and  functionality. 
Eut  scnetimes  the  practice  deviates  from  theory. 
In a completely integrated DDS the DBMS accesses stored
databases via the D/D. In a less integrated system, the DBMS
may maintain its own directory file for accessing stored
databases.

In the independent approach the DDS is completely autonomous:
it does not rely on any particular DBMS, and the DBMS
maintains its own source of metadata.

In the DBMS application approach the D/D appears to the DBMS
as just another database. The DBMS maintains its own metadata
for each database and these metadata are separate from the
D/D.

For the SPLICE system, it is proposed that the embedded
approach be used, where the DDS is actually a component of the
DBMS. This approach provides complete integration of the DDS.
The D/D is the only source of metadata. The DBMS utilities
provide the D/D management facilities and the DBMS uses the
D/D to directly access the stored databases. No other
directories, internal or external, exist for the DBMS, and the
DBMS and its facilities rely completely on the D/D for
metadata. Such a structure is shown in Figure 3.4.

Figure 3.4    SPLICE Embedded Approach to DDS


So, for example, a query processor extracts user views from
the DDS, and the DBMS applies integrity constraints specified
in the DDS by the DDS administrator before storing a data
element. A major difficulty here, which the SPLICE designers
must overcome, is that the DBMS for SPLICE already exists but
the DDS does not. The embedded approach is easier and simpler
when both DDS and DBMS are developed in parallel, but this is
not the case for SPLICE. Special attention and effort must
therefore be applied during the DDS development phase.
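
As a rough illustration of the embedded approach, the sketch below shows a DBMS that holds no metadata of its own and consults the D/D for the integrity constraint of an element before storing it. The class and method names (`DataDictionary`, `EmbeddedDBMS`, `define_element`, `constraint_for`) are hypothetical and do not describe the actual SPLICE DBMS.

```python
from typing import Any, Callable, Dict


class DataDictionary:
    """The only source of metadata; here, just integrity constraints."""

    def __init__(self) -> None:
        self._constraints: Dict[str, Callable[[Any], bool]] = {}

    def define_element(self, name: str, constraint: Callable[[Any], bool]) -> None:
        # In the text this is done by the DDS administrator.
        self._constraints[name] = constraint

    def constraint_for(self, name: str) -> Callable[[Any], bool]:
        # An undefined element raises KeyError: no other directory exists.
        return self._constraints[name]


class EmbeddedDBMS:
    """A DBMS whose only metadata source is the embedded D/D."""

    def __init__(self, dictionary: DataDictionary) -> None:
        self._dd = dictionary
        self._store: Dict[str, Any] = {}

    def store(self, element: str, value: Any) -> None:
        # Apply the constraint held in the D/D before storing the value.
        if not self._dd.constraint_for(element)(value):
            raise ValueError(f"{element}={value!r} violates its D/D constraint")
        self._store[element] = value

    def fetch(self, element: str) -> Any:
        return self._store[element]
```

Because the DBMS cannot store an element that the D/D does not describe, the sketch also shows why retrofitting a D/D onto an existing DBMS is harder than developing both in parallel.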
TABLE VIII
Types of Software Packages in a D/D System

Module                    Description

DDL processor             Creates a schema file

Database control system   Run-time unit of a DBMS

Preprocessor              Translates DML into CALL statements

Query/update processor    Provides direct end-user access to
                          stored databases

Batch-code generator      Reduces the time to develop a standard
                          function as compared to a compiler-level
                          language

Source-program manager    Provides security protection, data
                          compression and editing capabilities for
                          source programs

Teleprocessing monitor    Provides the capability of interactive
                          computing to remote terminals

Test-data generator       Creates test files and databases
                          according to user specifications

Design aid                Analyzes and generates designs of
                          databases or information systems
TABLE IX
Transactions for D/D Convert Function

Module type               Generated transactions

Programming               Element, group, record, file, and
                          sometimes subschema and process

Database description      Database, file, subschema,
                          relationship, record, group, element

Teleprocessing            Terminal, line, processor, transaction
IV.  SESSION SERVICES AND DATA DICTIONARY

A.   GENERAL

The term "session" is defined in [Ref. 4] as follows:

"Session: All the activity (message exchange and processing)
which takes place between two or more processes for the
duration of a single task (e.g. text editing or processing
of a transaction file)."

The session services module of SPLICE coordinates the
activity of the other functional modules (FM's) and provides
them with work instructions via the service codes it inserts
in messages to the FM's. The sequence of operations may be
data dependent or highly interactive, so in some cases the
work breakdown, including the full set of FM's that will be
involved, cannot be completely determined in advance by
session services. In all cases session services passes control
to the first (controlling) FM which is to perform an
operation; subsequent "calls" to other FM's, if any, take
place according to processing conditions. Session services
retains and maintains state information until either a
completion message or an error message has been received from
the controlling FM. In the case of a message which is destined
for an object located in another network, this fact is
indicated in the "message type" field. The physical
destination address would have been obtained previously from
the data dictionary, which exemplifies the relationship
between session services and the data dictionary.
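
The coordination just described can be sketched as follows. This is only an illustration under stated assumptions: the message fields, the `"LOCAL"`/`"REMOTE"` values of the message type, and the mapping from service code to controlling FM are hypothetical and are not taken from [Ref. 4].

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Message:
    session_id: int
    service_code: str      # work instruction inserted by session services
    message_type: str      # "LOCAL" or "REMOTE" (hypothetical values)
    destination: str = ""  # physical address obtained from the data dictionary
    body: str = ""


@dataclass
class SessionServices:
    controlling_fm: Dict[str, Callable[[Message], str]]  # service code -> first FM
    dictionary_addresses: Dict[str, str]                 # object name -> address
    local_objects: Set[str]
    state: Dict[int, str] = field(default_factory=dict)  # active sessions only
    _next_id: int = 1

    def start(self, service_code: str, target_object: str, body: str = "") -> str:
        session_id = self._next_id
        self._next_id += 1
        remote = target_object not in self.local_objects
        msg = Message(
            session_id,
            service_code,
            "REMOTE" if remote else "LOCAL",
            self.dictionary_addresses[target_object] if remote else "",
            body,
        )
        self.state[session_id] = "ACTIVE"
        try:
            # Control passes to the controlling FM; any further calls to
            # other FM's are that module's business.
            result = self.controlling_fm[service_code](msg)
        except Exception as exc:  # stands in for an error message from the FM
            result = f"ERROR: {exc}"
        # State is retained only until completion or error is reported.
        del self.state[session_id]
        return result
```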
Figure 4.1    Cooperation Between SS and Functional Modules.
Session services is used in a distributed environment and
involves the seven layer architecture model of the ISO for
distributed networks. The ISO seven layer architecture is a
standard one and involves the following layers with the
associated functions:

Layer          Function

Application    User process

Presentation   Formats data the way the user wants it

Session        Sets up session between communicating processes

Transport      End-to-end control

Network        Switching, routing

Data link      Reliable transmission between two nodes

Physical       Physical transmission of bits between two nodes

The complexity of the SPLICE processing environment requires
that user terminal processes be given considerable assistance
in carrying out their tasks [Ref. 4]. Session services can
provide this assistance. User terminal processes specify task
environments, largely by task name and with the assistance of
the data dictionary where necessary (Figure 1.1).

B.   ARCHITECTURE INTERFACES

In the SPLICE layered architecture, the interfaces between
the layers are critically important. In particular, we are
very interested in the software interfaces between the modules
which communicate with the data dictionary. These modules are
the session services module and the DBMS module. Some forms of
software interfaces between DBMS and D/D can be found in the
current literature [Ref. 5]. On the other hand, no one has yet
defined the required software interfaces between the D/D and
session services modules. We believe that these software
interfaces must be of the same type as, and closely related
to, the interfaces between the end user and the session
services. In a centralized system where session services does
not exist, the end user has to interface directly with the
D/D, but in a distributed system the session services module
acts as the mediator between the end user and the data
dictionary. As a minimum, then, the interfaces between session
services and the data dictionary in a distributed system must
include the interfaces between end user and data dictionary in
the centralized model.

The interfaces between the above modules must be designed to
accommodate new mechanisms and, as far as possible, new
functions as they arise. As new mechanisms and network
functions come into use in the system, it is highly desirable
that previously written programs continue to work. This is
achieved by designing the interfaces appropriately and
preserving them. In the seven layer architecture, layers 4, 5,
6 and 7 provide end-to-end communication between sessions in
user machines. Layers 1, 2 and 3 provide communication with
the nodes of the shared network.

Because the SPLICE system uses a modified ISO layered
approach, the interfaces between machines need to be defined
in terms of the layers. So we will have layer headers and
control messages that are passed between the layers. The
application programmer does not need to know anything about
these. For example, any command language using commands
similar to GET, PUT, OPEN, CLOSE and DELETE can refer to
data or facilities in a distant machine.
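
A minimal sketch of how layer headers keep the lower layers invisible to the application programmer is given below. The bracketed header format is invented for illustration and does not correspond to any real protocol or to the modified ISO layering that SPLICE uses.

```python
# Layer 7 down to layer 1, as listed in the table of ISO layers.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]


def send(payload: str) -> str:
    """Wrap the payload in one header per layer, top layer first."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Strip the headers in the opposite order, lowest layer first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

The application issues only a command such as `GET PART-123`; `send` and `receive` add and remove every header, so the command text that arrives at the distant machine is exactly what was issued.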
C.   THE SESSION SERVICES MODULE

There are differences in the session services provided
depending upon the type of network. In the distributed
environment different types of user software need different
types of session services. These differences involve not only
the software but also the architecture. So one set of session
services may be provided for one manufacturer's architecture
and a different set for another. This is very important for
SPLICE because the hardware used throughout the system varies.
It is possible that the services provided across the system
are of different types. However, it is desirable to have
common session services, because this will facilitate the
maintenance task. Also, for interfacing purposes we want
session services to present a common image to the system. This
can be accomplished by hiding necessary interface units from
the session services. In [Ref. 10 pp 491] there is a
description of possible functions of the session services
subsystem in a distributed network. These functions are
generally divided into three large groups:

-Functions required when setting up or disconnecting a
session.
-Functions used during the normal running of a session.
-Functions employed when something goes wrong, such as a
node failure or a protocol violation.

More precisely these functions are divided into the following
categories:

--Assistance in establishing a session
--Basic networking functions
--Application macroinstructions
--Program control facilities
--File access functions
--Recovery and error control
--Editing and translation
--Dialogue software
--Virtual operations and transparency
--Compaction
--Payment functions
--Security and audit functions

D.   INTERFACES

Functional interfaces between session services and data
dictionary must permit other software modules to access the
D/D and convert metadata into the format required by the
DDS.

A DDS provides many functions and features such as:

Maintenance

Extensibility

Report processor

Query processor

Convert

Software interface

Exit facility

The software interface function must provide a formatted
pathway enabling the DDS to provide metadata to other software
systems such as compilers and DDL processors [Ref. 5], to
retrieve information from the DDS, to update information where
it is permitted, and to obtain the restriction protocols for
data consistency and integrity. The software interface can
generate file descriptions for storage in a program library,
or accept the user identification and generate a copy of that
user's database view. It is not possible for this study to
describe precisely the software interfaces needed for the
SPLICE system. Because this system is under development, many
aspects of the system are still unknown and the software
modules are not yet described in full detail. So we will only
outline some of the software interfaces, without claiming that
these are sufficient for the SPLICE system. Interfaces can be
added to the system during the later stages of the system life
cycle, and existing interfaces can also be changed or improved
as needed.

Because COBOL is used throughout the system, the COBOL
"GENERATE" command can create from the D/D fully formatted
file and record definitions that can be stored in a library
file. Most COBOL clauses can be included, such as 88 levels,
SYNCHRONIZED, REDEFINES, and OCCURS. The OPTION clause of
this command can permit changes in names, the designation of
sequence numbers, level numbers and identifiers, and the
inclusion of program comments. An example of the use of this
command can be found in [Ref. 5 pp 261]. The generation date,
last revision date, and revision number can be automatically
recorded in both the listing and the D/D.

The output file can also contain job control statements. The
output file can then be executed as a job that creates and
catalogs the COBOL metadata as a member of a library under
control of any of the various source program managers.
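
To make the idea of generating record definitions from the D/D concrete, a small sketch follows. It does not reproduce the GENERATE command of [Ref. 5]; the tuple layout of a dictionary entry and the function name `generate_record` are assumptions, and only the PIC and OCCURS clauses are handled.

```python
from typing import List, Optional, Tuple

# Hypothetical D/D entry: (level, name, picture or None for a group item,
# occurs count or None).
Entry = Tuple[int, str, Optional[str], Optional[int]]


def generate_record(entries: List[Entry], revision: int) -> List[str]:
    """Return COBOL source lines for a record described by D/D entries."""
    # An asterisk in column 7 marks a COBOL comment line.
    lines = [" " * 6 + f"* GENERATED FROM D/D, REVISION {revision}"]
    for level, name, picture, occurs in entries:
        # Level 01 starts in column 8; deeper levels are indented further.
        indent = " " * (7 + (level // 5) * 4)
        clause = f"{level:02d}  {name}"
        if occurs is not None:
            clause += f" OCCURS {occurs} TIMES"
        if picture is not None:
            clause += f" PIC {picture}"
        lines.append(f"{indent}{clause}.")
    return lines
```

The returned lines could be written to a library member; recording the revision number in the comment line mirrors the automatic recording described above.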

A DML processor can also be used to interface between the
session services and data dictionary. A source program
triggers the DML processor by sending a service code through
the session services, and the DML processor interacts with the
data dictionary/directory. The output of the DML processor is
an expanded source program that is sent to a compiler for
compilation.

Other kinds of interfaces include query processors, source
program managers, various user interface facilities, and other
software packages.
Figure 4.2    Software Interface Using a DML Processor.
V.   D/D IN DISTRIBUTED ENVIRONMENT

A.   INTRODUCTION

In this chapter we will consider the design and function of
the DDS in the distributed database environment. Some
extensions to the centralized D/D are needed in order to
enable it to function effectively in a distributed
environment.

The distributed system is a subset of a general information
system. It is not necessary for the user to know how or where
the data is stored, in what way the data will be accessed by a
program, or how and where the processing is accomplished.
Unless the dictionary plays a highly active role in the
running of the distributed system, there is little need to try
to share one dictionary over the entire network. This is
because there is not likely to be a large amount of update
activity in a dictionary. The dictionary can normally be
reproduced at each node, and this is the proposed solution for
SPLICE. By using such an architecture, problems of updating
the dictionary across the network can be solved without much
overhead.

Of course the problem of distributed control in a network is
more complex than that of the hierarchical architecture of
dictionary systems which was discussed in chapter two. This is
one reason, in addition to the lack of experience with
distributed data dictionary systems, why we proposed
replication instead of distribution of the data dictionary for
SPLICE. The more the dictionary system acts as either the
control mechanism or a repository of control information, the
more complex the DBMS, network operating systems, and
dictionary system interactions become. For example, in the
case where we want to determine the best location for running
a query against a distributed and partially replicated
database [Ref. 6], the dictionary system is required to retain
information on the location of all data. Indeed, this
information may be highly dynamic itself, and therefore the
line between a dictionary and a "real" database becomes very
fuzzy.

Creation of a distributed information resource implies that a
number of hardware and software components are to be designed
and integrated into a controlled environment. These components
in SPLICE include several databases and database management
systems, user language interfaces, a data dictionary/directory
catalogue, transaction controllers and data input/output
control modules. We will describe the various system
components and we will also attempt to demonstrate their
integration with the International Organization for
Standardization (ISO) communications architecture and a data
storage and retrieval architecture (DSRA).

In general, a distributed system must provide to the end user
transparency, data sharing, data transfer, process transfer,
or a facility for combination of strategic, managerial and
operational reporting. In order to do that there are several
environmental constraints that must be satisfied [Ref. 12].
These are:

Data communications

Data storage and retrieval

Metadata

User language support

Process and report management

Information representation

System management

Integrity

Security

For the SPLICE system, communication must be integrated with
cooperative processing of the various existing software and
hardware. In order to do that we need to address the
considerations of the database interface with distributed
system tasks.

A distributed database is particularly useful to applications
that involve extensive processing in different locations.
SPLICE fits exactly in the above concept, as do airlines,
banking, retail, and military command and control
applications. The distributed database of SPLICE can be
allocated among the nodes of the network according to various
existing criteria for fragmentation. To avoid confusion in
distributed systems two different terms are used: partitioned
database, which consists of non-overlapping subsets, and
replicated database, which has some data redundancy [Ref. 5].
Replication enforces the locality and availability of the
database and reduces the frequency of accessing the DDN, but
requires the DBMS to provide more sophisticated concurrency
and recovery procedures. To avoid expensive overhead in data
management, restrictions must be established as to the degree
of data replication permitted. SPLICE belongs in the class of
replicated databases because the same item of the database can
be located in several locations, while the local databases
provide information for items stored in only one location.
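
The distinction between the two terms can be stated as a simple check. The sketch below assumes, for illustration only, that an allocation is given as a mapping from node name to the set of item names stored at that node.

```python
from itertools import combinations
from typing import Dict, Set


def classify(allocation: Dict[str, Set[str]]) -> str:
    """Return 'partitioned' if no item is stored at two nodes, else 'replicated'."""
    for (_, items_a), (_, items_b) in combinations(allocation.items(), 2):
        if items_a & items_b:  # a shared item means data redundancy
            return "replicated"
    return "partitioned"


def copies(allocation: Dict[str, Set[str]], item: str) -> int:
    """Count how many nodes hold a copy of the given item."""
    return sum(item in items for items in allocation.values())
```

A limit on the degree of replication, as recommended above, could then be expressed as an upper bound on `copies` for each item.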

Major problems in the development of techniques for a
distributed database are due to communication volumes and
delays and to the potential for parallel processing. Sometimes
it is very difficult to apply to distributed data processing
working solutions which are borrowed from the centralized
processing concept. These solutions often work well only in
one environment and do not transfer efficiently, so excessive
delays may occur. Parallel processing also has the potential
to increase throughput, but requires complex controls to
synchronize concurrent activities at dispersed sites. Because
a data dictionary is a database containing metadata, the same
problems existing in distributed databases also exist in a
distributed data dictionary. [Ref. 5] describes five basic
problems which must be addressed in distributed data
management:

-The coordination of the DBMS with the data transmission
network such that reliable delivery of messages can be
ensured.

-The decomposition of transactions into atomic parts,
selection of nodes to execute those parts, and control of
any movement of data between sites necessary to process
transactions.

-The synchronization of logically related updates and
retrievals that are processed at different nodes.

-The detection and resolution of conditions where a part
of the database becomes inaccessible due to node or line
failure.

-The management of metadata describing the distributed
database and environment. This last problem refers
particularly to the data dictionary and deserves special
attention.

B.   EXTENSIONS TO THE DDS

The role of a D/D in a distributed database environment is
very significant because it contains important information
about the description of the database distribution, the
characteristics of the nodes and other aspects of the data
communication network. Some additional entities must be
included in the DDS [Ref. 5]:

-The database entity, which describes the global view of
the database and includes attributes for relation and
attribute names, validity constraints, as well as
identification of local databases.

-The fragment entity, which describes portions of the
local database. This entity is not useful for SPLICE
because there are no fragments of the local database.

-The topology entity, which describes the physical
configuration of network components and the links between
the nodes.

-The node entity, which describes the combination of
network subcomponents at a particular site of the network.

-Finally, some other entities (terminal, line,
multiplexer, processor) describing network design.

We cannot say exactly what new entities should be added to
the SPLICE DDS, but at least initially, we believe that a form
of topology and node entities must be included. Those entities
are needed when non-local requests are processed, because the
software performing transaction management needs to reference
the D/D to determine the location of the needed data, the
user's access privileges, the status of addressed nodes, etc.
The interfaces needed for this purpose can be dynamic or
static, exactly as in the centralized case.
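
One possible shape for the topology and node entities, and for the check that transaction management would make against them, is sketched below. The attribute names and the `"UP"`/`"DOWN"` status values are assumptions; [Ref. 5] does not prescribe this layout, and only direct links between nodes are considered.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class NodeEntity:
    name: str
    status: str                                   # "UP" or "DOWN" (hypothetical)
    files: Set[str] = field(default_factory=set)
    authorized_users: Set[str] = field(default_factory=set)


@dataclass
class TopologyEntity:
    links: Set[Tuple[str, str]]                   # undirected links between nodes

    def linked(self, a: str, b: str) -> bool:
        return (a, b) in self.links or (b, a) in self.links


def locate(nodes: Dict[str, NodeEntity], topology: TopologyEntity,
           origin: str, user: str, file_name: str) -> str:
    """Return a node that holds the file, is up, is reachable, and admits the user."""
    for node in nodes.values():
        reachable = node.name == origin or topology.linked(origin, node.name)
        if (file_name in node.files and node.status == "UP"
                and user in node.authorized_users and reachable):
            return node.name
    raise LookupError(f"no usable location for {file_name}")
```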

C.   THE DDS AS A DISTRIBUTED DATABASE

Practically, the D/D, when supporting a distributed system,
becomes itself a distributed database. The contents of the D/D
may reside at various locations. We cannot say that this
approach fits exactly in the SPLICE case. The approach we have
proposed for SPLICE is quite different. No partition of the
D/D is permitted. That means the D/D cannot be a distributed
database as we know it in the original form. The solution
proposed for the SPLICE DDS is based on replication instead of
distribution of the DDS. On the other hand, there are some
other reasonable solutions which follow the distributed
concept more closely. Since experience with distributed
systems is relatively small, the steps needed to reach a
decision must be taken very carefully in order to avoid
mistakes.
The designer of a DDS encounters some of the same basic
problems as does the designer of a distributed database. When
we design a D/D we must determine the extent of environmental
dependency between the D/D and the DBMS. As we said before,
the distributed D/D is an extension of the centralized one, so
the three basic variations in the type of relationship between
a DDS and a DBMS still apply. In the independent distributed
approach the DDS has no running connections to any portions of
the DBMS and is not actively or directly used in transaction
processing by the DBMS. In the DBMS-application approach the
D/D is just another distributed database to the DBMS, and
separate data management functions are not needed to handle
the D/D. The DBMS may manage its own run-time directory that
is separate from the D/D. In the embedded distributed approach
the D/D provides the run-time directory for the DBMS. All the
components of the DBMS obtain their metadata from the D/D. The
size, location, and contents of the D/D would also affect the
performance of other DDS functions such as maintenance,
reporting, and query [Ref. 5].

D.   A MODEL FOR A DISTRIBUTED DDS

In this section we examine a distributed model for the SPLICE
DDS. Its structure is shown in Figure 2.4, and involves the
partition of the global DDS into different views containing
information for one or more local databases. These different
views can be located at each LAN or at selected LAN's.

The global (or network) dictionary is the nucleus around
which all the management functions of a DDS are centered. It
contains [Ref. 11] information to start every management
process of the SPLICE distributed database. In particular it
contains:

a.-Information for the DDS design

   -File access programs
   -Total volumes of queries for each file
   -Total volumes of updates for each file

This statistical information is very useful, especially for
evaluating the optimal number of redundant copies.

b.-Information for the distribution function

   -Number and types of transmission links, their unit
    cost, their mean utilization factor
   -Routing tables
   -CPU workloads
   -Disk utilization

This information can help determine the optimal allocation of
redundant file copies and of possible operation parallelism.

c.-General information about data and how data is shared
among the various nodes of the system, including the number of
D/D copies and where they are located.

d.-Information about existing constraints, status of the
system, node failures etc.

e.-Information about data transportability

f.-Information related to data used by applications having a
global view. Such applications are, for example, those where
different local databases are involved in execution. We said
in a previous section that sometimes data redundancy is
preferable to the frequent use of the DDN. That means
information about the sites where a component (i.e. spare
part) is located must be held somewhere in a central position.
So in the case where the component cannot be found in the
local database, the user has to access the global data
dictionary to find the places where the particular item is
located.
To be able to design and run application or retrieval
programs the global D/D must contain information [Ref. 11]
about:

Data structures

Data location

Data availability

Data accessibility (related to security, compatibility etc)

Data translation maps, access paths

Data entities

Common procedures

Events and their interrelations

This dictionary must be able to answer queries about the DB
and DBMS's involved in a transaction and how the transaction
can be formulated to obtain the most efficient result.

Local dictionaries include information about local databases
and applications, local data entities, local procedures, local
interrelations, physical storage structures of local data,
access methods, access paths, physical storage devices, and
redundancy of data items.

In [Ref. 11] a structure is proposed for a distributed D/D
quite different from the SPLICE approach. This structure, as
shown in Figure 5.1, involves the existence of:

Network dictionary
Global external dictionary
Global conceptual dictionary
Local external dictionary
Local conceptual dictionary
Internal dictionary

and each one of the above performs a different function.

This architecture, which is purely distributed, is probably
too complicated to be implemented for SPLICE. It is a
theoretical model and if we try to implement it, we may face
serious interface problems, resulting in the data dictionary
becoming the main resource consumer.

Figure 5.1    A Purely Distributed Approach for a DDS

The functions we intend to include in the SPLICE DDS will
play a major role if we want to avoid complex structure and
saturation. These functions must be the minimum needed for the
proper operation of the system. We believe that, in the case
where the distributed instead of the replicated approach is
followed, the architecture shown in Figure 2.4 is the more
practical.


Following the above architecture, a global dictionary located
in some node has the role of maintaining consistency
throughout the whole SPLICE system. Requests for updates,
deletions, and additions are routed through the data
dictionary administrator, and after an evaluation procedure
the global dictionary is updated. Then the changes are
transmitted to the various locations where the local copies
are updated. Updates are also transmitted to the data
directory.
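
The update path just described can be sketched as follows. The sketch is an assumption-laden illustration: the administrator's evaluation procedure is reduced to a callable, local copies are plain dictionaries, the data directory records only an address for each object name, and transmission over the network is replaced by direct assignment.

```python
from typing import Callable, Dict, List


class GlobalDictionary:
    """Global dictionary that pushes approved changes to local copies."""

    def __init__(self, evaluate: Callable[[str, str], bool]) -> None:
        self.entries: Dict[str, str] = {}
        self.local_copies: List[Dict[str, str]] = []
        self.directory: Dict[str, str] = {}   # object name -> address
        self._evaluate = evaluate             # stands in for the administrator

    def request_update(self, name: str, definition: str, address: str) -> bool:
        # Every request passes through the evaluation procedure first.
        if not self._evaluate(name, definition):
            return False
        self.entries[name] = definition
        # Changes are then transmitted to each local copy.
        for copy in self.local_copies:
            copy[name] = definition
        # The data directory is updated as well.
        self.directory[name] = address
        return True
```

Routing every change through one evaluation point is what lets a single global dictionary keep the local copies consistent; a rejected request leaves every copy untouched.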

Data directories can be located at the inventory control
points (ICP). In contrast with the data dictionary, the data
directory contains global information only about subject,
service code, object name, and address. All the other
information is located in the global and the various local
dictionaries. The data dictionary administrator is responsible
for maintaining the data directory as well. Different views of
the global dictionary are located in various LAN's. Each view
can serve one or more LAN's, and it is preferable that it be
located at the LAN where it is most frequently used in order
to avoid unnecessary usage of the DDN.

When an item is not found in the local database, the user
routes a value location request through the session services
(service code) to the data directory, and the data directory
replies with the location address. Using this information, the
user can request and establish a session with the remote
database where the requested information resides.
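
The lookup sequence above can be illustrated with a minimal
Python sketch. The entry fields follow the four items the data
directory is said to hold (subject, service code, object name,
address). Everything else is an assumption made for illustration:
the names DirectoryEntry, DataDirectory, and fetch are
hypothetical, and plain dictionaries stand in for the local
database, the remote databases, and the session that would be
established through the session services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    subject: str
    service_code: str
    object_name: str
    address: str  # LAN where the object's database resides


class DataDirectory:
    def __init__(self, entries):
        self._by_name = {e.object_name: e for e in entries}

    def locate(self, object_name):
        """Return the address of the LAN holding the object, or None."""
        entry = self._by_name.get(object_name)
        return entry.address if entry else None


def fetch(object_name, local_db, directory, remote_dbs):
    """Look locally first; otherwise ask the directory and read remotely."""
    if object_name in local_db:
        return local_db[object_name], "local"
    address = directory.locate(object_name)
    if address is None:
        raise KeyError(f"{object_name} is not known to the data directory")
    # Stands in for requesting and establishing a session with the
    # remote database at the returned address.
    return remote_dbs[address][object_name], address
```

Only requests that miss the local database reach the data
directory, which is why the directory can stay small.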
VI.  CONCLUSIONS AND RECOMMENDATIONS

A.   CONCLUSIONS

Our objectives, as described in the first chapter, were to
investigate the area of data dictionary/directory systems in a
distributed environment, to outline the advantages and
disadvantages of these systems, to present the underlying
ideas, to examine the benefits for the SPLICE system from using
a dictionary/directory system, and finally to delineate the
interface requirements between a data dictionary/directory
system and other functional modules. In addition to the above
objectives, we also discussed some ideas concerning the
organization of the data administration function, and four
hierarchical architectures for the DDS, each one with a
different degree of distribution.
The first architecture is based on the replication of the D/D.
There are no different views of the D/D, only exact copies of
one view located in each LAN. Using this architecture we have
62 replicated copies of the D/D (the same as the number of
LANs), each containing the information (metadata) about all
SPLICE data base definitions and functions residing in each
LAN. This architecture minimizes access to the DDN but has the
drawback of requiring a lot of secondary storage. The size of
the D/D, statistical and other information concerning the
frequency of using the DDN, and the amount of information
included in the D/D will all have an impact on the
effectiveness of this architecture.

The second architecture, which allocates replicated copies of
the D/D to selected nodes (the most active), is more
conservative. In the case of a huge dictionary, this saves a
significant amount of secondary storage, but requires heavier
use of the DDN. Here the size of the D/D and the appropriate
nodes at which to install the replicated copies seriously
affect the effectiveness of this architecture.
The third architecture is based on distribution of the D/D.
Different views of the D/D reside in each LAN and contain
information only concerning the local data base. This
architecture involves the use of a data directory (we propose
two replicated copies, one located in each ICP). The use of the
data directory (which contains limited information) provides a
kind of "relation or connection" between the various views.
Also, a global dictionary is needed in order to provide
consistency and global function facilities throughout the
system. This architecture is more dynamic than the previous
two. It has the advantage of saving secondary storage but, on
the other hand, increases the use of the DDN even more.
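
The storage side of this trade-off can be made concrete with a
small calculation. The sketch below is only an illustration: the
figure of 62 LANs comes from the text, but every size and the
number of "most active" nodes are placeholder values we assume
here, since the statistics needed for a real estimate were not
available. It counts secondary storage only; DDN traffic moves in
the opposite direction and is not modeled.

```python
def replicated_storage(dict_size, copies):
    """Storage when `copies` full copies of the D/D are kept."""
    return dict_size * copies


def distributed_storage(view_sizes, dict_size, directory_size,
                        directory_copies=2):
    """Local views, one global dictionary, and the data directory copies."""
    return sum(view_sizes) + dict_size + directory_size * directory_copies


LANS = 62            # number of LANs given in the text
DICT_MB = 10.0       # assumed size of the full D/D
ACTIVE_NODES = 8     # assumed number of "most active" nodes
VIEW_MB = 0.5        # assumed size of each local view
DIRECTORY_MB = 1.0   # assumed size of one data directory copy

full = replicated_storage(DICT_MB, LANS)              # first architecture
selective = replicated_storage(DICT_MB, ACTIVE_NODES)  # second architecture
distributed = distributed_storage([VIEW_MB] * LANS,    # third architecture
                                  DICT_MB, DIRECTORY_MB)
```

With these placeholder values the totals are 620, 80, and 43
megabytes respectively. The ordering, not the absolute numbers,
is the point, and it depends entirely on the assumed sizes.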

A fourth architecture was discussed just to mention another
possibility for a distributed architecture, but our estimation
is that it would be too expensive in system resource
consumption for SPLICE.

Three environmental dependency options for the DDS
(independent, completely integrated, and DBMS dependent) were
also discussed. The main reason for choosing the embedded (DBMS
dependent) approach is that the data dictionary is going to be
used only for the SPLICE system (so the independent approach
does not make any sense), and also that the SPLICE data base
already exists. The embedded approach was also chosen because
of the homogeneity of the DBMS environments across LANs. The
independent and completely integrated approaches are too costly
at this time, although the latter could eventually be
implemented from an embedded environment.

B.   RECOMMENDATIONS

From the investigations performed, we have the following
main recommendations for the SPLICE system:

a.-  The TANDEM data dictionary that already exists should be
the basis for the SPLICE data dictionary.

b.-  The D/D should be implemented only for new applications,
because it is a herculean task to retrofit the D/D to the
existing old applications.

c.-  The embedded (DBMS dependent) approach should be used for
the D/D.

d.-  Two candidate architectures should be examined further,
based on statistical and other information (not available for
the present thesis):

     -Replicated architecture (Figure 2.3) with selection
      of nodes where each copy will reside.
     -Distributed architecture (Figure 2.4) with the use of
      two replicated copies of the data directory, one
      located at each ICP.
e.-  A DML processor should be used to interface between the
data dictionary and session services.

TANDEM DATA DICTIONARY

1.   Overview

This appendix is included to mention some features (hopefully
the most important) of the TANDEM data dictionary, since the
TANDEM DBMS will be used in the SPLICE system. For a more
detailed description of the TANDEM D/D, see [Ref. 13].

A data definition language (DDL) is a language used by the
data dictionary administrator to describe record and file
structures of a database. After the description, the resulting
source file is input to the DDL compiler, and the DDL compiler
can create data declaration source language for database
records in three languages: COBOL, FORTRAN, and TAL. The DDL
compiler can also produce FUP (file utility program) file
creation commands for database files. The most significant
feature of DDL is its ability to create and maintain a data
dictionary. The TANDEM data dictionary is a set of seven files
that documents the structure and location of each file in a
database.

The DDL provides facilities for updating a dictionary as the
database it describes grows and the structure of the database
files changes. The DDL compiler and the dictionary it creates
serve as a central point of control over a database.

TANDEM defines a database as a collection of files structured
to serve one or more applications. When a list of DDL
statements (a DDL source schema) is given to the DDL compiler,
the compiler can produce any of the following files:

*  A data dictionary.

*  A FUP file creation command source.

*  A data declaration source for COBOL, FORTRAN, or TAL.

*  A schema report summarizing each record's structure and
   each file's access keys.
The data dictionary produced by the DDL compiler is a set of
files that forms a permanent record of the database schema.
Thus the database schema, stored as a set of dictionary files,
becomes a system resource. The dictionary gives database
managers information about each file in the database and also
shows how the files relate to each other. After the dictionary
has been created, the DDL compiler can read the dictionary and
produce COBOL, FORTRAN, or TAL data declaration source for any
record defined by the schema. The dictionary is also used by
ENFORM, TANDEM's database query language and report writer.

2.   Creating a Dictionary

The data dictionary files can be created on any subvolume in
the system. The subvolume that is to contain the data
dictionary is specified with the DDL DICT command (for example,
?DICT $STOCKNO.QNTY). The DDL compiler first creates the
dictionary files on the quantity subvolume of the $STOCKNO
volume, and then opens the files for access.

3.   Dictionary Reports

TANDEM provides DDL users with ENFORM source for twelve
dictionary reports. The twelve reports document all of the
DEFINITION and RECORD entries in the dictionary, describing not
only their structures, but how they relate to each other as
well.

Once a schema describing a database has been compiled by the
DDL compiler and a dictionary has been produced, information
about the database can easily be obtained with a set of
TANDEM-provided ENFORM queries. The reports produced by these
queries provide:

*  Database documentation.

*  Database analysis information.

*  Quick access to dictionary contents.

The dictionary reports are produced from ENFORM source that is
available to the user. This means that in addition to the
standard reports, you can obtain customized reports, tailored
to answer specific questions, by simply editing the
TANDEM-supplied ENFORM source. The ENFORM dictionary report
source file consists of 12 queries that produce 12 different
reports. Each query is a separate section. Thus the queries can
be run as a complete group, individually, or in any
combination. The 12 dictionary reports are shown in Table X.

4.   Updating the Dictionary

As the database changes, its dictionary can be updated to
reflect the changes by adding, deleting, or modifying
DEFINITION and RECORD entries. Table XI summarizes the TANDEM
dictionary modification functions.
TABLE X
Dictionary Report Summary

Query
Name    Report description

R1      DICTIONARY OBJECTS- R1 describes each DEF and
        RECORD in the dictionary, giving the time and
        date of creation, the time and date of the
        last modification, and the version number for
        each object.

R2      DEFINITION STRUCTURE- R2 lists all of the
        component groups and fields for each DEF in
        the dictionary.

R3      RECORD STRUCTURE- R3 lists all of the
        component groups and fields for each RECORD
        in the dictionary.

R4      DEFINITIONS USING DEFINITIONS- R4 shows
        which DEFs are referenced by other DEFs.
        The referencing DEF is listed with each
        of its elements that references another
        DEF and the referenced DEF's name.

R5      RECORDS USING DEFINITIONS- R5 shows which
        DEFs are referenced by RECORDs. Each RECORD
        is listed with each of its elements that
        references a DEF and the referenced DEF's
        name.

R6      DEFINITIONS WHERE USED- R6 lists each DEF
        that is referenced by another object, be it
        a DEF or a RECORD. The referencing DEF or
        RECORD is shown in each case.

R7      RECORD ACCESS- R7 lists the file name and
        access keys (both primary and alternate) for
        each RECORD in the dictionary.

R8      RECORD DEFINITION METHOD- R8 shows the method
        used to define each RECORD. The source DEF
        is listed for those RECORDs defined with the
        DEF IS <def name> clause.

R9      REPORT HEADINGS- R9 lists all of the ENFORM
        report headings declared for fields and
        groups within each DEF and RECORD in the
        dictionary.

R10     DISPLAY FORMATS- R10 lists all of the ENFORM
        display formats declared for fields and
        groups within each DEF and RECORD in the
        dictionary.

R11     RECORD COMMENTS- R11 lists the comments that
        immediately preceded the defining RECORD
        statement for each RECORD in the dictionary.

R12     DEFINITION COMMENTS- R12 lists the comments
        that immediately preceded the defining DEF
        statement for each DEF in the dictionary.
TABLE XI
Dictionary Modification Function

Operat./Ent. type     Procedure

ADD/DEF               Open dictionary with ?DICT and
                      compile new DEF statement.

ADD/RECORD            Open dictionary with ?DICT and
                      compile new RECORD statement.

DELETE/DEF            Open dictionary with ?DICT, delete
                      all dictionary entries that
                      reference the DEF, and then delete
                      the DEF itself with DELETE.

DELETE/RECORD         Open dictionary with ?DICT
                      and then delete the RECORD entry
                      with the DELETE statement.

MODIFY/DEF            Open dictionary with ?DICT
                      command, then delete all other
                      RECORD and DEF entries that
                      reference the DEF, delete the DEF,
                      recompile the edited DEF, and,
                      finally, recompile the DEF and
                      RECORD statements that
                      reference the DEF.

MODIFY/RECORD         Open dictionary with ?DICT
(with no              and recompile edited
DEF changes)          RECORD statement.

MODIFY/RECORD         Open dictionary with ?DICT
(with DEF             and delete the RECORD with
changes)              the DELETE statement. Then
                      modify any DEF entries that need
                      to be changed, and finally,
                      recompile the new RECORD statement.
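
The ordering that the MODIFY/DEF procedure imposes can be shown
with a toy model in Python. This is a sketch under our own
assumptions, not TANDEM DDL: the function modify_def, the entry
layout, and the "delete" and "compile" step labels are
hypothetical, and an in-memory dictionary stands in for the
dictionary files. Only entries that reference the DEF directly
are handled.

```python
def modify_def(dictionary, def_name, new_body):
    """Apply the MODIFY/DEF steps to a toy in-memory dictionary.

    `dictionary` maps an entry name to {"kind", "body", "refs"},
    where `refs` is the set of DEF names the entry references.
    Returns the list of steps performed, in order.
    """
    steps = []
    # Entries (DEFs or RECORDs) that reference the DEF being changed.
    dependents = {
        name: entry for name, entry in dictionary.items()
        if def_name in entry["refs"]
    }
    # 1. Delete the referencing entries, then the DEF itself.
    for name in sorted(dependents):
        del dictionary[name]
        steps.append(("delete", name))
    old = dictionary.pop(def_name)
    steps.append(("delete", def_name))
    # 2. Recompile the edited DEF.
    dictionary[def_name] = {**old, "body": new_body}
    steps.append(("compile", def_name))
    # 3. Recompile the entries that reference it.
    for name in sorted(dependents):
        dictionary[name] = dependents[name]
        steps.append(("compile", name))
    return steps
```

Finding the dependents first corresponds to what report R6
(DEFINITIONS WHERE USED) provides, which is why that report is
useful before a DEF is modified or deleted.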